刻板印象,偏见和歧视已在机器学习(ML)方法(例如计算机视觉(CV)[18,80],自然语言处理(NLP)[6]或两者兼有大图像和大图像和两者兼而有之)标题模型,例如OpenAI剪辑[14]。在本文中,我们评估了ML偏差如何在世界内部和自主作用的机器人中表现出来。我们审核了最近发表的几种剪贴式机器人操纵方法之一,向其呈现在表面上有人脸的图片,这些物体在种族和性别之间各不相同,以及包含与常见刻板印象相关的术语的任务说明。我们的实验明确表明机器人对性别,种族和科学持有的较大的构成观念的作用,并大规模地划分了。此外,经过审核的方法不太可能认识有色人种和有色人种。我们的跨学科社会技术分析跨越了科学技术与社会(STS),批判性研究,历史,安全,机器人技术和AI等领域和应用。我们发现,由大型数据集和溶解模型提供动力的机器人(有时称为“基础模型”,例如剪辑),其中包含人类风险在物理上放大恶性刻板印象;而且,仅纠正差异将不足以使问题的复杂性和规模不足。取而代之的是,我们建议机器人学习方法在适当的时候暂停,重新设计甚至损坏,直到结果被证明是安全,有效和公正的,才能暂停,重新工作甚至损坏其他有害结果。最后,我们讨论了有关身份安全评估框架和设计正义等主题的新的跨学科研究的全面政策变化,以及更好地理解和解决这些危害的主题。
translated by 谷歌翻译
作为自治机器人的互动和导航在诸如房屋之类的真实环境中,可靠地识别和操纵铰接物体,例如门和橱柜是有用的。在对象铰接识别中许多先前的作品需要通过机器人或人类操纵物体。虽然最近的作品已经解决了从视觉观测的预测,但他们经常假设根据其运动约束的铰接部件移动的类别级运动模型或观察序列的先验知识。在这项工作中,我们提出了Formnet,是一种神经网络,该神经网络识别来自RGB-D图像和分段掩模的单帧对象部分的对象部分之间的铰接机制。从6个类别的149个铰接对象的100K合成图像培训网络培训。通过具有域随机化的光保护模拟器呈现合成图像。我们所提出的模型预测物体部件的运动残余流动,并且这些流量用于确定铰接类型和参数。该网络在训练有素的类别中的新对象实例上实现了82.5%的铰接式分类精度。实验还展示了该方法如何实现新颖类别的泛化,并且在没有微调的情况下应用于现实世界图像。
translated by 谷歌翻译
Increasing research interests focus on sequential recommender systems, aiming to model dynamic sequence representation precisely. However, the most commonly used loss function in state-of-the-art sequential recommendation models has essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers the vanishing gradient problem from numerous negative sampling and predictionbiases; Binary Cross-Entropy (BCE) loss subjects to negative sampling numbers, thereby it is likely to ignore valuable negative examples and reduce the training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representation. To avoid these limitations, in this paper, we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, which enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models GRU4Rec, SASRec, and S3-Rec can reach 125.63%, 69.90%, and 33.24% average improvement of full ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data increases rapidly with the wall clock time, and is superior to that of other loss functions in almost the whole process of model training.
translated by 谷歌翻译
The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need of developing an effective energy management method to minimize economic costs and keep self energy-sufficiency. The multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability. However, its training requires massive energy operation data of microgrids (MGs), while gathering these data from different MGs would threaten their privacy and data security. Therefore, this paper tackles this practical yet challenging issue by proposing a federated multi-agent deep reinforcement learning (F-MADRL) algorithm via the physics-informed reward. In this algorithm, the federated learning (FL) mechanism is introduced to train the F-MADRL algorithm thus ensures the privacy and the security of data. In addition, a decentralized MMG model is built, and the energy of each participated MG is managed by an agent, which aims to minimize economic costs and keep self energy-sufficiency according to the physics-informed reward. At first, MGs individually execute the self-training based on local energy operation data to train their local agent models. Then, these local models are periodically uploaded to a server and their parameters are aggregated to build a global agent, which will be broadcasted to MGs and replace their local agents. In this way, the experience of each MG agent can be shared and the energy operation data is not explicitly transmitted, thus protecting the privacy and ensuring data security. Finally, experiments are conducted on Oak Ridge national laboratory distributed energy control communication lab microgrid (ORNL-MG) test system, and the comparisons are carried out to verify the effectiveness of introducing the FL mechanism and the outperformance of our proposed F-MADRL.
translated by 谷歌翻译
This paper presents a safety-critical locomotion control framework for quadrupedal robots. Our goal is to enable quadrupedal robots to safely navigate in cluttered environments. To tackle this, we introduce exponential Discrete Control Barrier Functions (exponential DCBFs) with duality-based obstacle avoidance constraints into a Nonlinear Model Predictive Control (NMPC) with Whole-Body Control (WBC) framework for quadrupedal locomotion control. This enables us to use polytopes to describe the shapes of the robot and obstacles for collision avoidance while doing locomotion control of quadrupedal robots. Compared to most prior work, especially using CBFs, that utilize spherical and conservative approximation for obstacle avoidance, this work demonstrates a quadrupedal robot autonomously and safely navigating through very tight spaces in the real world. (Our open-source code is available at github.com/HybridRobotics/quadruped_nmpc_dcbf_duality, and the video is available at youtu.be/p1gSQjwXm1Q.)
translated by 谷歌翻译
Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in semi-supervised manner. The first step is to detect the possible positions of structures on transverse slice globally, and then the local patches are cropped based on detected positions. The second step is to distinguish whether the patches contain real vertebral structures and screen the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning: for position detection in the first step, (1) anatomical prior is used to screen pseudo labels generated from confidence threshold method; (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices; (3) for patch identification in the second step, the categories are rebalanced in each batch to solve imbalance problem. Experimental results demonstrate that VertMatch can detect vertebra accurately in ultrasound volume and outperforms state-of-the-art methods. VertMatch is also validated in clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.
translated by 谷歌翻译
Free-text rationales (FTRs) follow how humans communicate by explaining reasoning processes via natural language. A number of recent works have studied how to improve language model (LM) generalization by using FTRs to teach LMs the correct reasoning processes behind correct task outputs. These prior works aim to learn from FTRs by appending them to the LM input or target output, but this may introduce an input distribution shift or conflict with the task objective, respectively. We propose KNIFE, which distills FTR knowledge from an FTR-augmented teacher LM (takes both task input and FTR) to a student LM (takes only task input), which is used for inference. Crucially, the teacher LM's forward computation has a bottleneck stage in which all of its FTR states are masked out, which pushes knowledge from the FTR states into the task input/output states. Then, FTR knowledge is distilled to the student LM by training its task input/output states to align with the teacher LM's. On two question answering datasets, we show that KNIFE significantly outperforms existing FTR learning methods, in both fully-supervised and low-resource settings.
translated by 谷歌翻译
Time series anomaly detection strives to uncover potential abnormal behaviors and patterns from temporal data, and has fundamental significance in diverse application scenarios. Constructing an effective detection model usually requires adequate training data stored in a centralized manner, however, this requirement sometimes could not be satisfied in realistic scenarios. As a prevailing approach to address the above problem, federated learning has demonstrated its power to cooperate with the distributed data available while protecting the privacy of data providers. However, it is still unclear that how existing time series anomaly detection algorithms perform with decentralized data storage and privacy protection through federated learning. To study this, we conduct a federated time series anomaly detection benchmark, named FedTADBench, which involves five representative time series anomaly detection algorithms and four popular federated learning methods. We would like to answer the following questions: (1)How is the performance of time series anomaly detection algorithms when meeting federated learning? (2) Which federated learning method is the most appropriate one for time series anomaly detection? (3) How do federated time series anomaly detection approaches perform on different partitions of data in clients? Numbers of results as well as corresponding analysis are provided from extensive experiments with various settings. The source code of our benchmark is publicly available at https://github.com/fanxingliu2020/FedTADBench.
translated by 谷歌翻译
To alleviate the data scarcity problem in End-to-end speech translation (ST), pre-training on data for speech recognition and machine translation is considered as an important technique. However, the modality gap between speech and text prevents the ST model from efficiently inheriting knowledge from the pre-trained models. In this work, we propose AdaTranS for end-to-end ST. It adapts the speech features with a new shrinking mechanism to mitigate the length mismatch between speech and text features by predicting word boundaries. Experiments on the MUST-C dataset demonstrate that AdaTranS achieves better performance than the other shrinking-based methods, with higher inference speed and lower memory usage. Further experiments also show that AdaTranS can be equipped with additional alignment losses to further improve performance.
translated by 谷歌翻译
Point clouds are characterized by irregularity and unstructuredness, which pose challenges in efficient data exploitation and discriminative feature extraction. In this paper, we present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology as a completely regular 2D point geometry image (PGI) structure, in which coordinates of spatial points are captured in colors of image pixels. \mr{Intuitively, Flattening-Net implicitly approximates a locally smooth 3D-to-2D surface flattening process while effectively preserving neighborhood consistency.} \mr{As a generic representation modality, PGI inherently encodes the intrinsic property of the underlying manifold structure and facilitates surface-style point feature aggregation.} To demonstrate its potential, we construct a unified learning framework directly operating on PGIs to achieve \mr{diverse types of high-level and low-level} downstream applications driven by specific task networks, including classification, segmentation, reconstruction, and upsampling. Extensive experiments demonstrate that our methods perform favorably against the current state-of-the-art competitors. We will make the code and data publicly available at https://github.com/keeganhk/Flattening-Net.
translated by 谷歌翻译